1 Introduction

This chapter introduces context algebras and demonstrates their application to 
combining logical and vector-based representations of meaning. Other chapters 
in this volume consider approaches that attempt to reproduce aspects of logical 
semantics within new frameworks. The approach we present here is different: 
We show how logical semantics can be embedded within a vector space frame- 
work, and use this to combine distributional semantics, in which the meanings of 
words are represented as vectors, with logical semantics, in which the meaning 
of a sentence is represented as a logical form. 

The ideas discussed here are present (at least implicitly) in earlier work;
however, we have introduced some notions that allow the mathematics to be
tidied considerably:

• When context algebras were introduced [3] they were applied only to func- 
tions from a free monoid A* to K. In fact, this construction generalises to 
functions from A* to an arbitrary vector space V. The proof of the general 
case is identical to the specific one, and is reproduced here unchanged. 

• This more general construction gives us an elegant way of embedding
logical semantics within an algebraic framework. The embedding presented
here follows similar lines to the thinking of [3], but uses the new, more
general, context algebras.



The method of combining logical semantics with vector-based lexical
semantics is new, but follows similar lines to an approach suggested in [3].



1.1 Motivation 

Like other work in this book, we are concerned with the question of how to 
compose vector-based representations of meaning so that phrases and sentences 
are also represented as vectors. We wish to preserve the wonderful flexibility 
and fine-grained distinctions of meaning that vector spaces allow, and which 
have been so successful in lexical semantics, to build a complete framework for
natural language semantics encompassing words, phrases, sentences and beyond. 

Unlike other work, in the approach presented here, we do not attempt to 
reconstruct logical semantics from scratch, instead embedding logical represen- 
tations within a vector space. This has some benefits: 

• Doing natural language semantics well is difficult, and a lot of work has 
gone into getting logical semantics for natural language right. It includes 
worrying about things like anaphora resolution, generalised quantifiers 
and negation, and reproducing this work from scratch in a vector-based 
framework is a mammoth task. Our approach allows us to reuse existing 
work while incorporating vector-based lexical semantics. 

• There is the potential to reuse existing tools for natural language seman- 
tics, although computation in general is a problem with our approach. 

The downside to our approach is that we don't yet have an efficient way of com- 
puting with it, although we have ideas for how this may be achieved. Another 
potential criticism of this approach is that the flexibility in how vector repre- 
sentations are combined with logic may be hindered by requiring the wholesale 
adoption of existing formalisms, rather than the more tailored approaches of 
other work. 

2 Theory of Meaning 

We first recall some basic definitions: 

Definition 1 (Algebra over a field). An algebra over a field is a vector space $A$
over a field $K$ together with a binary operation $(a, b) \mapsto ab$ on $A$ that
is bilinear,

(1) $a(\alpha b + \beta c) = \alpha ab + \beta ac$

(2) $(\alpha a + \beta b)c = \alpha ac + \beta bc$

for all $a, b, c \in A$ and all $\alpha, \beta \in K$. If we additionally have the
property $(ab)c = a(bc)$ then $A$ is called associative. An algebra is called
unital if it has a distinguished unity element 1 satisfying $1x = x1 = x$ for all
$x \in A$. We are generally only interested in real associative algebras, where
$K$ is the field of real numbers, $\mathbb{R}$.

Examples of associative algebras are given by square matrices of order n 
under normal matrix multiplication and entry-wise vector operations. The field 
of the algebra is the field of the elements of the matrices; so real valued matrices 
form a real associative algebra. 
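
As a small illustration of Definition 1, the following sketch checks equations (1) and (2), associativity and the unity on 2 x 2 integer matrices. The helper names (`mat_mul`, `mat_add`, `mat_scale`) and the particular matrices are assumptions made for this example only.

```python
def mat_mul(p, q):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(p, q):
    """Entry-wise sum of two matrices."""
    return [[x + y for x, y in zip(row_p, row_q)] for row_p, row_q in zip(p, q)]

def mat_scale(alpha, p):
    """Scalar multiple of a matrix."""
    return [[alpha * x for x in row] for row in p]

# Arbitrary illustrative elements of the algebra and scalars.
a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
c = [[2, 0], [5, -1]]
one = [[1, 0], [0, 1]]
alpha, beta = 3, -2

# Equation (1): a(alpha b + beta c) = alpha ab + beta ac
left_linear = (
    mat_mul(a, mat_add(mat_scale(alpha, b), mat_scale(beta, c)))
    == mat_add(mat_scale(alpha, mat_mul(a, b)), mat_scale(beta, mat_mul(a, c)))
)
# Equation (2): (alpha a + beta b)c = alpha ac + beta bc
right_linear = (
    mat_mul(mat_add(mat_scale(alpha, a), mat_scale(beta, b)), c)
    == mat_add(mat_scale(alpha, mat_mul(a, c)), mat_scale(beta, mat_mul(b, c)))
)
associative = mat_mul(mat_mul(a, b), c) == mat_mul(a, mat_mul(b, c))
unital = mat_mul(one, a) == a == mat_mul(a, one)
commutative = mat_mul(a, b) == mat_mul(b, a)
```

Integer entries are used so that the equalities are exact. The last flag shows that nothing in the definition requires the product to commute.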

2.1 Meaning as Context 

The distributional hypothesis of Harris [5] states that words will have similar 
meanings if and only if they occur in similar contexts. We formalise this idea, 
and examine the resultant mathematical properties. 
Let $A$ be some set, which we imagine to be the set of words of a natural
language. If $V$ is a vector space, we define a general language for $V$ (or
simply a language when there is no ambiguity) as a function from the free
monoid $A^*$ to $V$. For each string $x \in A^*$, we have associated with it a
vector in $V$ that may have several interpretations:

• V may simply be the real numbers R, and the language may describe 
a probability distribution over strings in A*, in which case we can view 
the language as a generative model of a natural language, describing the 
probability of observing each possible string as a sentence or a document. 

• V may be a vector space describing the meaning of strings, for example 
a representation of model-theoretic semantics. In this case, the language 
attaches a meaning to each possible string in A* . 

Given a general language $L$ we define the context vector $\hat{x}$ of a string
$x$ as a function from $A^* \times A^*$ to $V$:

$\hat{x}(y, z) = L(yxz)$

Thus, as in the study of formal languages, we consider the context of a string to
be the pair of strings surrounding it. We think of $\hat{x}$ as an element of the
vector space $V^{A^* \times A^*}$, the space of functions from $A^* \times A^*$ to
$V$. This is a vector space with operations defined point-wise, i.e. if
$f, g \in V^{A^* \times A^*}$ and $\alpha \in K$, where $K$ is the field of $V$,
then $(\alpha f)(x, y) = \alpha f(x, y)$ and $(f + g)(x, y) = f(x, y) + g(x, y)$
for all $x, y \in A^*$.
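
The definition of a context vector can be made concrete with a small sketch. It assumes a hypothetical toy language with values in the real numbers, stored as sentence frequencies (`LANGUAGE`), and returns only the non-zero components of a context vector; the function name `context_vector` is likewise an assumption of this example.

```python
# Hypothetical toy language L: A* -> R, stored as sentence frequencies.
# Any string that is not listed is taken to have value 0.
LANGUAGE = {
    ("cats", "sleep"): 5,
    ("dogs", "sleep"): 3,
    ("dogs", "bark"): 2,
}

def context_vector(x, language=LANGUAGE):
    """Return the non-zero components of the context vector of x.

    The result maps each context (y, z) to L(yxz).
    """
    x = tuple(x)
    vector = {}
    for sentence, value in language.items():
        # Every occurrence of x inside a sentence yields one context (y, z).
        for start in range(len(sentence) - len(x) + 1):
            if sentence[start:start + len(x)] == x:
                y, z = sentence[:start], sentence[start + len(x):]
                vector[(y, z)] = value
    return vector

dogs = context_vector(["dogs"])
sleep = context_vector(["sleep"])
empty = context_vector([])  # context vector of the empty string
```

Here `dogs` has two non-zero contexts, one for each sentence in which the word occurs, and the context vector of the empty string simply lists every way of splitting a sentence in two.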

Definition 2 (Generated Subspace $\mathcal{A}$). The subspace $\mathcal{A}$ of
$V^{A^* \times A^*}$ is the set defined by

(3) $\mathcal{A} = \{a : a = \sum_{x \in A^*} \alpha_x \hat{x} \text{ for some } \alpha_x \in \mathbb{R}\}$

In other words, it is the space of all vectors formed from linear combinations of
context vectors.

Given this definition, we can define multiplication on $\mathcal{A}$ by assuming
linearity and making the multiplication compatible with the underlying
multiplication of $A^*$. That is, we want to define a product $\cdot$ on
$\mathcal{A}$ such that $\hat{x} \cdot \hat{y} = \widehat{xy}$ for all
$x, y \in A^*$. However, in general there is more than one basis for
$\mathcal{A}$ formed from elements $\hat{x}$, for $x \in A^*$. We need to confirm
that multiplication will be the same, regardless of which basis we choose.

Proposition 1 (Context Algebra). Multiplication on $\mathcal{A}$ is the same
irrespective of the choice of basis $\mathcal{B}$.

Proof. We say $B \subseteq A^*$ defines a basis $\mathcal{B}$ for $\mathcal{A}$
when $\mathcal{B}$ is a basis such that $\mathcal{B} = \{\hat{x} : x \in B\}$.
Assume there are two sets $B_1, B_2 \subseteq A^*$ that define corresponding
bases $\mathcal{B}_1$ and $\mathcal{B}_2$ for $\mathcal{A}$. We will show that
multiplication in basis $\mathcal{B}_1$ is the same as in the basis
$\mathcal{B}_2$.

We represent two basis elements $\hat{u}_1$ and $\hat{u}_2$ of $\mathcal{B}_1$ in
terms of basis elements of $\mathcal{B}_2$:

(4) $\hat{u}_1 = \sum_i \alpha_i \hat{v}_i$ and $\hat{u}_2 = \sum_j \beta_j \hat{v}_j$

for some $u_i \in B_1$, $v_j \in B_2$ and $\alpha_i, \beta_j \in \mathbb{R}$. First
consider multiplication in the basis $\mathcal{B}_1$. Note that
$\hat{u}_1 = \sum_i \alpha_i \hat{v}_i$ means that
$L(x u_1 y) = \sum_i \alpha_i L(x v_i y)$ for all $x, y \in A^*$. This includes
the special case where $y = u_2 y'$ so

(5) $L(x u_1 u_2 y') = \sum_i \alpha_i L(x v_i u_2 y')$

for all $x, y' \in A^*$. Similarly, we have
$L(x u_2 y) = \sum_j \beta_j L(x v_j y)$ for all $x, y \in A^*$, which includes
the special case $x = x' v_i$, so
$L(x' v_i u_2 y) = \sum_j \beta_j L(x' v_i v_j y)$ for all $x', y \in A^*$.
Inserting this into the above expression yields

(6) $L(x u_1 u_2 y) = \sum_{i,j} \alpha_i \beta_j L(x v_i v_j y)$

for all $x, y \in A^*$, which we can rewrite as

(7) $\hat{u}_1 \cdot \hat{u}_2 = \widehat{u_1 u_2} = \sum_{i,j} \alpha_i \beta_j (\hat{v}_i \cdot \hat{v}_j) = \sum_{i,j} \alpha_i \beta_j \widehat{v_i v_j}$

Conversely, the product of $\hat{u}_1$ and $\hat{u}_2$ using the basis
$\mathcal{B}_2$ is

(8) $\hat{u}_1 \cdot \hat{u}_2 = \sum_i \alpha_i \hat{v}_i \cdot \sum_j \beta_j \hat{v}_j = \sum_{i,j} \alpha_i \beta_j (\hat{v}_i \cdot \hat{v}_j)$

thus showing that multiplication is defined independently of what we choose as
the basis. □

Multiplication as defined above makes $\mathcal{A}$ an algebra; moreover, it is
easy to see that it is associative, since the multiplication on $A^*$ is
associative. It has a unity, which is given by $\hat{\epsilon}$, where
$\epsilon$ is the empty string.
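
The product on the context algebra can be sketched by representing an element as a formal linear combination of strings, multiplying by concatenation extended bilinearly, and evaluating the result at a context through the language. The toy language and the names `multiply`, `evaluate` and `UNITY` are assumptions of this example; two different formal combinations may denote the same element of the algebra, which is exactly the situation Proposition 1 addresses.

```python
from collections import defaultdict

# Hypothetical toy language with integer frequencies; unlisted strings have value 0.
LANGUAGE = {("cats", "sleep"): 5, ("dogs", "sleep"): 3, ("dogs", "bark"): 2}

def language_value(string):
    return LANGUAGE.get(tuple(string), 0)

def multiply(u, v):
    """Product of two formal combinations {string: coefficient}.

    Strings are concatenated and coefficients multiplied, extended bilinearly.
    """
    result = defaultdict(int)
    for x, alpha in u.items():
        for y, beta in v.items():
            result[x + y] += alpha * beta
    return dict(result)

def evaluate(u, y, z):
    """Component of the element u at the context (y, z)."""
    return sum(alpha * language_value(tuple(y) + x + tuple(z))
               for x, alpha in u.items())

UNITY = {(): 1}  # the context vector of the empty string
pets = {("cats",): 1, ("dogs",): 1}
sleep = {("sleep",): 1}
bark = {("bark",): 1}

pets_sleep = multiply(pets, sleep)
value_at_empty_context = evaluate(pets_sleep, (), ())
```

Associativity and the unity follow directly from the corresponding properties of string concatenation, as the text notes.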



2.2 Entailment 

Our notion of entailment is founded on the idea of distributional generality
[5]. This is the idea that the distribution over contexts has implications not
only for similarity of meaning, but can also describe how general a meaning is.
A term $t_1$ is considered distributionally more general than another term $t_2$
if $t_1$ occurs in a wider range of contexts than $t_2$. It is proposed that
distributional generality may be connected to semantic generality. For example,
we may expect the term animal to occur in a wider range of contexts than the
term cat since the first is semantically more general.

We translate this to a mathematical definition by making use of an implicit 
partial ordering on the vector space: 
Definition 3 (Partially ordered vector space). A partially ordered vector space
$V$ is a real vector space together with a partial ordering $\leq$ such that:

if $x \leq y$ then $x + z \leq y + z$

if $x \leq y$ then $\alpha x \leq \alpha y$

for all $x, y, z \in V$, and for all $\alpha \geq 0$. Such a partial ordering is
called a vector space order on $V$. An element $u$ of $V$ satisfying $u \geq 0$
is called a positive element; the set of all positive elements of $V$ is denoted
$V^+$. If $\leq$ defines a lattice on $V$ then the space is called a vector
lattice or Riesz space.

If $V$ is a vector lattice, then the vector space of contexts,
$V^{A^* \times A^*}$, is a vector lattice, where the lattice operations are
defined component-wise: $(u \land v)(x, y) = u(x, y) \land v(x, y)$, and
$(u \lor v)(x, y) = u(x, y) \lor v(x, y)$. For example, $\mathbb{R}$ is a vector
lattice with meet as the min operation and join as max, so
$\mathbb{R}^{A^* \times A^*}$ is also a vector lattice. In this case, where the
value attached to a context is an indication of its frequency of occurrence,
$\hat{x} \leq \hat{y}$ means that $y$ occurs at least as frequently as $x$ in
every context.

Note that, unlike the vector operations, the lattice operations are dependent 
on the basis: a different basis gives different operations. This makes sense in the 
linguistic setting, since there is nearly always a distinguished basis, originating 
in the contexts from which the vector space is formed. 
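
For real-valued context vectors the component-wise lattice operations described above can be sketched as follows. The sparse dictionary representation, the helper names and the two frequency vectors are assumptions of this example; contexts that are not listed are taken to have value 0.

```python
def _combine(u, v, op):
    """Apply op component-wise, treating missing contexts as 0 and dropping zeros."""
    result = {}
    for context in set(u) | set(v):
        value = op(u.get(context, 0), v.get(context, 0))
        if value != 0:
            result[context] = value
    return result

def meet(u, v):
    """Lattice meet: component-wise minimum."""
    return _combine(u, v, min)

def join(u, v):
    """Lattice join: component-wise maximum."""
    return _combine(u, v, max)

def leq(u, v):
    """Vector space order: u <= v when this holds in every component."""
    return all(u.get(c, 0) <= v.get(c, 0) for c in set(u) | set(v))

# Hypothetical frequency-valued context vectors for two terms.
cat = {("the", "purrs"): 2, ("the", "sleeps"): 3}
animal = {("the", "purrs"): 2, ("the", "sleeps"): 7, ("the", "barks"): 4}

cat_below_animal = leq(cat, animal)
animal_below_cat = leq(animal, cat)
```

In this toy data animal occurs at least as often as cat in every context, so the order holds in one direction only, and the meet of the two vectors is the vector for cat.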

3 From logical forms to algebra 

Model-theoretic approaches generally deal with a subset of all possible strings, 
the language under consideration, translating sequences in the language to a log- 
ical form, expressed in another, logical language. Relationships between logical 
forms are expressed by an entailment relation on this logical language. 

This section is about the algebraic representation of logical languages. Rep- 
resenting logical languages in terms of an algebra will allow us to incorporate 
statistical information about language into the representations. For example, if 
we have multiple parses for a sentence, each with a certain probability, we will 
be able to represent the meaning of the sentence as a probabilistic sum of the 
representations of its individual parses. 

By a logical language we mean a language $\Lambda \subseteq A'^*$ for some
alphabet $A'$, together with a relation $\vdash$ on $\Lambda$ that is reflexive
and transitive; this relation is interpreted as entailment on the logical
language. We will show how each element $u \in \Lambda$ can be associated with a
projection on a vector space; it is these projections that define the algebra.
Later we will show how this can be related to strings in the natural language
$A$ that we are interested in.

For a subset $T$ of a set $S$, we define the projection $P_T$ on $L^\infty(S)$
(the set of all bounded real-valued functions on $S$) by

$P_T e_s = e_s$ if $s \in T$, and $P_T e_s = 0$ otherwise,

where $e_s$ is the basis element of $L^\infty(S)$ corresponding to the element
$s \in S$. Given $u \in \Lambda$, define
$\downarrow_\vdash(u) = \{v : v \vdash u\}$.

As a shorthand we write $P_u$ for the projection $P_{\downarrow_\vdash(u)}$ on
the space $L^\infty(\Lambda)$. The projection $P_u$ can be thought of as
projecting onto the space of logical statements that entail $u$. This is made
formal in the following proposition:

Proposition 2. $P_u \leq P_v$ if and only if $u \vdash v$.

Proof. Recall that the partial ordering on projections is defined by
$P_u \leq P_v$ if and only if $P_u P_v = P_v P_u = P_u$. Clearly

$P_u P_v e_w = e_w$ if $w \vdash u$ and $w \vdash v$, and $P_u P_v e_w = 0$ otherwise,

so if $u \vdash v$ then, since $\vdash$ is transitive, if $w \vdash u$ then
$w \vdash v$, so we must have $P_u P_v = P_v P_u = P_u$.

Conversely, if $P_u P_v = P_u$ then it must be the case that $w \vdash u$
implies $w \vdash v$ for all $w \in \Lambda$, including $w = u$. Since $\vdash$
is reflexive, we have $u \vdash u$, so $u \vdash v$, which completes the
proof. □

To help us understand this representation better, we will show that it is
closely connected to the ideal completion of partial orders. Define a relation
$\equiv$ on $\Lambda$ by $u \equiv v$ if and only if $u \vdash v$ and
$v \vdash u$. Clearly $\equiv$ is an equivalence relation; we denote the
equivalence class of $u$ by $[u]$. Equivalence classes are then partially
ordered by $[u] \leq [v]$ if and only if $u \vdash v$. Then note that
$\bigcup \downarrow[u] = \downarrow_\vdash(u)$, thus $P_u$ projects onto the
space generated by the basis vectors corresponding to the elements
$\bigcup \downarrow[u]$, the ideal completion representation of the partially
ordered equivalence classes.

What we have shown here is that logical forms can be viewed as projections 
on a vector space. Since projections are operators on a vector space, they are 
themselves vectors; viewing logical representations in this way allows us to treat 
them as vectors, and we have all the flexibility that comes with vector spaces: 
we can add them, subtract them and multiply them by scalars; since the vector 
space is also a vector lattice, we also have the lattice operations of meet and 
join. As we will see in the next section, in some special cases such as that 
of the propositional calculus, the lattice meet and join coincide with logical 
conjunction and disjunction. 
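
Proposition 2 can be checked mechanically on a small finite logical language. In the sketch below, the logical forms, their sets of models (`MODELS`) and the function names are assumptions of this example: entailment is taken to be inclusion of model sets, which is reflexive and transitive, and a coordinate projection is stored as the set of basis elements it keeps, so that composing two projections is set intersection.

```python
from itertools import product

# Hypothetical logical language: each form is given by the set of models
# (worlds) in which it holds; v entails u when every model of v is a model of u.
MODELS = {
    "cat(x)": {1},
    "feline(x)": {1},        # equivalent to cat(x) in this toy setting
    "animal(x)": {1, 2},
    "plant(x)": {3},
    "true": {1, 2, 3},
}

def entails(v, u):
    return MODELS[v] <= MODELS[u]

def down_set(u):
    """The set {v : v entails u}; P_u projects onto span{e_v : v in down_set(u)}."""
    return frozenset(v for v in MODELS if entails(v, u))

def projection_leq(u, v):
    """P_u <= P_v iff P_u P_v = P_u; for coordinate projections this is
    the statement that intersecting the two index sets gives back the first."""
    return down_set(u) & down_set(v) == down_set(u)

proposition_2_holds = all(
    projection_leq(u, v) == entails(u, v) for u, v in product(MODELS, repeat=2)
)
```

The two equivalent forms share the same down-set and hence the same projection, which matches the passage to equivalence classes described above.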

3.1 Example: Propositional Calculus 

In this section we apply the ideas of the previous section to an important special
case: that of the propositional calculus. We choose as our logical language
$\Lambda$ the language of a propositional calculus with the usual connectives
$\lor$, $\land$ and $\lnot$, the logical constants $\top$ and $\bot$ representing
"true" and "false" respectively, with $u \vdash v$ meaning "infer $v$ from $u$",
behaving in the usual way. Then:

$P_{u \land v} = P_u P_v$

$P_{\lnot u} = 1 - P_u + P_\bot$

$P_\top = 1$

To see this, note that the equivalence classes of $\vdash$ form a Boolean algebra
under the partial ordering induced by $\vdash$, with

$[u \land v] = [u] \land [v]$

$[u \lor v] = [u] \lor [v]$

$[\lnot u] = \lnot [u]$.

Note that while the symbols $\land$, $\lor$ and $\lnot$ refer to logical
operations on the left hand side, on the right hand side they are the operations
of the Boolean algebra of equivalence classes; they are completely determined by
the partial ordering associated with $\vdash$.

Since the partial ordering carries over to the ideal completion we must have

$\downarrow[u \land v] = \downarrow[u] \cap \downarrow[v]$

$\downarrow[u \lor v] = \downarrow[u] \cup \downarrow[v]$

Since $u \vdash \top$ for all $u \in \Lambda$, it must be the case that
$\downarrow[\top]$ contains all sets in the ideal completion. However the Boolean
algebra of subsets in the ideal completion is larger than the Boolean algebra of
equivalence classes; the latter is embedded as a Boolean sub-algebra of the
former. Specifically, the least element in the completion is the empty set,
whereas the least element in the equivalence classes is represented as
$\downarrow[\bot]$. Thus negation carries over with respect to this least
element:

$\downarrow[\lnot u] = (\downarrow[\top] - \downarrow[u]) \cup \downarrow[\bot]$.

We are now in a position to prove the original statements: 

• Since $\downarrow[\top]$ contains all sets in the completion,
$\bigcup \downarrow[\top] = \downarrow_\vdash(\top) = \Lambda$, and $P_\top$
must project onto the whole space, that is $P_\top = 1$.

• Using the above expression for $\downarrow[u \land v]$, taking unions of the
disjoint sets in the equivalence classes we have
$\downarrow_\vdash(u \land v) = \downarrow_\vdash(u) \cap \downarrow_\vdash(v)$.
Making use of the equation in the proof of Proposition 2 we have
$P_{u \land v} = P_u P_v$.

• In the above expression for $\downarrow[\lnot u]$, note that
$\downarrow[\top] \supseteq \downarrow[u] \supseteq \downarrow[\bot]$. This
allows us to write, after taking unions and converting to projections,
$P_{\lnot u} = 1 - P_u + P_\bot$, since $P_\top = 1$.

1 In the context of model theory, the Boolean algebra of equivalence classes of sentences of 
some theory T is called the Lindenbaum-Tarski algebra of T [6] . 
• Finally, we know that $u \lor v \equiv \lnot(\lnot u \land \lnot v)$, and since
equivalent elements in $\Lambda$ have the same projections we have

$P_{u \lor v} = 1 - P_{\lnot u \land \lnot v} + P_\bot$
$\phantom{P_{u \lor v}} = 1 - P_{\lnot u} P_{\lnot v} + P_\bot$

It is also worth noting that in terms of the vector lattice operations $\lor$ and
$\land$ on the space of operators on $L^\infty(\Lambda)$, we have
$P_{u \lor v} = P_u \lor P_v$ and $P_{u \land v} = P_u \land P_v$.
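
The conjunction and top identities of this section can be checked on a finite fragment. The sketch below is an illustration under stated assumptions: it takes one representative per equivalence class of formulas over two atoms, identifies each formula with its set of satisfying valuations, and stores a coordinate projection as the set of basis elements it keeps. All names are hypothetical, and only the identities for conjunction and for the constant "true" are checked here.

```python
from itertools import combinations, product

ATOMS = ("p", "q")
VALUATIONS = list(product([False, True], repeat=len(ATOMS)))

# One representative per equivalence class: a formula is identified with the
# set of valuations that satisfy it (16 classes for two atoms).
FORMULAS = [frozenset(c)
            for r in range(len(VALUATIONS) + 1)
            for c in combinations(VALUATIONS, r)]
TOP = frozenset(VALUATIONS)   # "true" is satisfied by every valuation
BOTTOM = frozenset()          # "false" is satisfied by none

def conj(u, v):
    """Conjunction: satisfied exactly where both formulas are satisfied."""
    return u & v

def down_set(u):
    """All formulas w with w entailing u, i.e. the index set of the projection P_u."""
    return frozenset(w for w in FORMULAS if w <= u)

def compose(p, q):
    """Product of two coordinate projections, stored as index sets."""
    return p & q

top_is_identity = down_set(TOP) == frozenset(FORMULAS)
conjunction_is_product = all(
    down_set(conj(u, v)) == compose(down_set(u), down_set(v))
    for u in FORMULAS for v in FORMULAS
)
bottom_in_every_down_set = all(BOTTOM in down_set(u) for u in FORMULAS)
```

The last flag records that "false" entails every formula, so its basis element lies in the range of every projection $P_u$.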

3.2 From logic to context algebras

In the simplest case, we may be able to assign to each natural language
sentence a single sentence in the logical language (its interpretation). Let
$\Gamma \subseteq A^*$ be a formal language consisting of natural language
sentences $x$ with a corresponding interpretation in the logical language
$\Lambda$, which we denote $\rho(x)$. The function $\rho$ maps natural language
sentences to their interpretations, and may incorporate tasks such as word-sense
disambiguation, anaphora resolution and semantic disambiguation.

We can now define a general language to represent this situation. We take as
our vector space $V$ the space generated by projections
$\{P_u : u \in \Lambda\}$ on $L^\infty(\Lambda)$. For $x \in A^*$ we define

$L(x) = P_{\rho(x)}$ if $x \in \Gamma$, and $L(x) = 0$ otherwise.

Given the discussions in the preceding sections, it is clear that for
$x, y \in \Gamma$, $L(x) \leq L(y)$ if and only if $\rho(x) \vdash \rho(y)$, so
the partial ordering of the vector space encodes the entailment relation of the
logical language.

The context algebra constructed from $L$ gives meaning to any substring
of elements of $\Gamma$, so any natural language expression which is a substring
of a sentence with a logical interpretation has a corresponding non-zero element
in the algebra. If $\Gamma$ has the property that no element of $\Gamma$ is a
substring of any other element of $\Gamma$ (for example, $\Gamma$ consists of
natural language sentences starting and ending with a unique symbol), then we
will also have $\hat{x} \leq \hat{y}$ if and only if $\rho(x) \vdash \rho(y)$,
for all $x, y \in \Gamma$. In this case, the only context which maps to a
non-zero vector is the pair of empty strings, $(\epsilon, \epsilon)$.

3.3 Incorporating Word Vectors 

The construction of the preceding section is not very useful on its own, as it 
merely encodes the logical reasoning within a vector space framework. However, 
we will show how this construction can be used to incorporate vector-based 
representations of lexical semantics. 
In the general case, we assume that associated with each word $a \in A$ there is
a vector $\psi(a)$ in some finite-dimensional vector space $L^\infty(S)$ that
represents its lexical semantics, perhaps obtained using latent semantic analysis
or some other technique. $S$ is a finite set indexing the lexical semantic vector
space, which we interpret as containing aspects of meaning. This set may be
partitioned into subsets containing aspects for different parts of speech; for
example, we may wish that the vector for a verb is always disjoint from the
vector for a noun. Similarly, there may be a single aspect for each closed class
or function word in the natural language, to allow these words not to have a
vector nature.

Instead of mapping directly from strings of the natural language to the logical
language, we map from strings of aspects of meaning. Let $\Delta \subseteq S^*$
be a formal language consisting of all meaningful strings of aspects, i.e. those
with a logical interpretation. As before, we assume a function $\rho$ from
$\Delta$ to a logical language $\Lambda$. The corresponding context algebra
$\mathcal{A}$ describes composition of aspects of meaning.

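We can now describe the representation of the meaning $\hat{a}$ of a word
$a \in A$ in terms of elements of $\mathcal{A}$:

$\hat{a} = \sum_{s \in S} \psi(a)_s \hat{s}$

where $\psi(a)_s$ is the component of $\psi(a)$ for the aspect $s$; thus a term
is represented as a weighted sum of the context vectors for its aspects.
Composition of $\mathcal{A}$ together with the distributivity of the algebra is
then enough to define vectors for any string in $A^*$. For $x \in A^*$, we
define $\hat{x}$ by

$\hat{x} = \hat{x}_1 \cdot \hat{x}_2 \cdots \hat{x}_n$

where $x = x_1 x_2 \cdots x_n$ for $x_i \in A$.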

This construction achieves the goal of combining vector space representa- 
tions of lexical semantics with existing logical formalisms. It has the following 
properties: 

• The entailment relation between the logical expressions associated with 
sentences is encoded in the partial ordering of the vector space. 

• The vector space representation of a sentence includes a sum over all 
possible logical sentences, where each word has been represented by one 
of its aspects. 
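
The two properties above can be illustrated with a minimal sketch. Everything named in it is an assumption made for illustration: the lexical weights `PSI`, the interpretation `RHO` of aspect strings, and the toy entailment relation given by `MODELS`. Because all the projections involved are diagonal in the basis indexed by logical forms, the operator attached to the empty context is stored by its diagonal entries.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical lexical vectors: weights over aspects (here, word senses).
PSI = {
    "bat": {"bat_animal": Fraction(3, 4), "bat_club": Fraction(1, 4)},
    "animal": {"animal": Fraction(1)},
    "flies": {"fly": Fraction(1)},
}

# Hypothetical interpretation of aspect strings as logical forms.
RHO = {
    ("bat_animal", "fly"): "fly(bat)",
    ("bat_club", "fly"): "fly(club)",
    ("animal", "fly"): "fly(animal)",
}

# Hypothetical entailment: v entails u when the models of v are among those of u.
MODELS = {"fly(bat)": {1}, "fly(animal)": {1, 2}, "fly(club)": {3}}

def down_set(u):
    return [v for v in MODELS if MODELS[v] <= MODELS[u]]

def sentence_operator(words):
    """Diagonal of the sentence vector at the empty context.

    Sums, over every choice of one aspect per word, the product of the
    weights times the projection for the interpretation of that aspect string.
    """
    diagonal = defaultdict(Fraction)
    for choice in product(*(PSI[word].items() for word in words)):
        names = tuple(name for name, _ in choice)
        if names not in RHO:
            continue  # aspect strings without an interpretation contribute 0
        weight = Fraction(1)
        for _, value in choice:
            weight *= value
        for form in down_set(RHO[names]):
            diagonal[form] += weight
    return dict(diagonal)

bat_flies = sentence_operator(["bat", "flies"])
animal_flies = sentence_operator(["animal", "flies"])
strictly_entails = all(bat_flies[l] <= animal_flies.get(l, 0) for l in bat_flies)
```

In this toy setting the sentence with the ambiguous word is a weighted sum of two projections, and it is not below the sentence about animals in the vector space order because of the weight carried by the second sense.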

Discussion 

This second property is actually inconsistent with the idea that distributional 
generality determines semantic generality. To see why, consider the sentences 

1. No animal likes cheese 

2. No cat likes cheese 

Although the first sentence entails the second, the term animal is more general
than cat; the quantifier no has reversed the direction of entailment. The
distributivity of the algebra means that if the term animal occurs in a wider
range of contexts than cat (i.e. it is distributionally more general), then this
generality must persist through to the sentence level as terms are multiplied.
There are two ways of viewing this inconsistency: 

1. The construction is incorrect because the distributional hypothesis does 
not hold universally at the lexical level. In fact, this idea is justified by 
our example above, in which the more general term, animal, would be 
expected to occur in a smaller range of contexts than cat when preceded 
by the quantifier no. Under this view, the construction needs to be altered 
so that quantifiers such as no can reverse the direction of entailment. In 
fact, for this to be the case, we would need to dispense with using an 
algebra altogether, since distributivity is a fundamental property of the 
algebra. 

2. The construction is correct as long as the "aspects" of words are really 
their senses, so the vector space nature only represents semantic ambiguity 
and not semantic generality. In reality, aspects may be more subtle than 
what is normally considered a word sense, and the vector space represen- 
tation would capture this subtlety. In this view, distributional generality 
is correlated with semantic generality at the sentence level, but not nec- 
essarily the word level. Vector representations describe only semantic 
ambiguity and not generality, which should be incorporated into the asso- 
ciated logical representation. This would mean that the algorithms used 
for obtaining vectors for terms would have to have more in common with 
automatic word sense induction than the more general semantic induction 
associated with the distributional hypothesis. 

An alternative solution for this type of quantifier (another example is all) 
and negation in general, is to use a construction similar to that proposed in 
[7], which uses Bell states to swap dimensions in a similar manner to qubit 
operators. 

3.4 Partial Entailment 

In general, it is unlikely that any two strings $x, y \in A^*$ which humans would
judge as entailing would have vectors such that $\hat{x} \leq \hat{y}$, because
of the nature of the automatically obtained vectors. Instead, it makes sense to
consider a degree of entailment between strings. In this case, we assume we have
a linear functional $\phi$ on $V^{A^* \times A^*}$. The degree to which $x$
entails $y$ is then defined as

$\frac{\phi(\hat{x} \land \hat{y})}{\phi(\hat{x})}$

This has many of the properties we would expect from a degree of entailment.
In certain cases $\phi$ can be used to make the space an Abstract Lebesgue space,
in which case we can interpret the above definition as a conditional probability.
This idea, and methods of defining $\phi$, are discussed in detail in [3].
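
A minimal sketch of the degree of entailment for real-valued, non-negative context vectors follows. The choice of functional (the sum of all components) and the two frequency vectors are assumptions of this example, not something fixed by the text.

```python
from fractions import Fraction

def meet(u, v):
    """Component-wise minimum of two sparse non-negative context vectors."""
    return {c: min(u[c], v[c]) for c in set(u) & set(v)}

def phi(u):
    """Hypothetical linear functional: the sum of all components."""
    return sum(u.values())

def degree_of_entailment(x, y):
    """The ratio phi(x meet y) / phi(x), returned as an exact fraction."""
    return Fraction(phi(meet(x, y)), phi(x))

# Hypothetical frequency-valued context vectors.
cat = {("the", "purrs"): 2, ("the", "sleeps"): 3}
animal = {("the", "purrs"): 2, ("the", "sleeps"): 7, ("the", "barks"): 4}

cat_entails_animal = degree_of_entailment(cat, animal)
animal_entails_cat = degree_of_entailment(animal, cat)
```

When one vector lies below the other in the order, the degree is 1; in the other direction the value reads like a conditional probability, here the share of the occurrences of animal that are matched by cat.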
For the vector space $V$ generated by projections on $L^\infty(\Lambda)$, one
possible definition of $\phi$ would be given by

$\phi(u) = \sum_{l \in \Lambda} P(l) \, \|u(\epsilon, \epsilon) e_l\|$

where $P(l)$ is some probability distribution over elements of $\Lambda$. The
interpretation of this is: all contexts except those consisting of a pair of
empty strings are ignored (so only strings that have logical interpretations make
a contribution to the value of the linear functional); the value is then given by
summing over all strings of the logical language and multiplying the probability
of each string $l$ by the size of the vector resulting from the action of the
operator on the basis element corresponding to $l$. The probability distribution
over $\Lambda$ needs to be estimated; this could perhaps be done using machine
learning:

• Given a corpus, build a model for each sentence in the corpus and its
negation using a model builder.

• Train a support vector machine using the models for the sentence and its
negation as the two classes. This requires defining a kernel on models.

• Given a new string $l \in \Lambda$, build a model, then use the probability
estimate of the support vector machine for the model belonging to the positive
class, together with a normalisation function.
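
Setting the estimation problem aside, the functional defined above for projection-valued languages can be sketched directly once a distribution over logical forms is given. In the sketch below the distribution `P`, the two sentence operators and all function names are assumptions of this example; the operators are diagonal in the basis indexed by logical forms, so the norm of the image of a basis element is the absolute value of the corresponding diagonal entry.

```python
from fractions import Fraction

# Diagonals of the operators at the empty context for two hypothetical
# sentences, indexed by logical form.
bat_flies = {"fly(bat)": Fraction(3, 4), "fly(club)": Fraction(1, 4)}
animal_flies = {"fly(bat)": Fraction(1), "fly(animal)": Fraction(1)}

# Hypothetical probability distribution over the logical language.
P = {
    "fly(bat)": Fraction(1, 2),
    "fly(animal)": Fraction(1, 4),
    "fly(club)": Fraction(1, 4),
}

def phi(diagonal):
    """Sum over logical forms l of P(l) times the size of the image of e_l."""
    return sum(P[l] * abs(value) for l, value in diagonal.items())

def meet(u, v):
    """Lattice meet of two diagonal operators: entry-wise minimum."""
    return {l: min(u.get(l, 0), v.get(l, 0)) for l in set(u) | set(v)}

def degree_of_entailment(x, y):
    return phi(meet(x, y)) / phi(x)

forward = degree_of_entailment(bat_flies, animal_flies)
backward = degree_of_entailment(animal_flies, bat_flies)
```

Exact fractions are used so that the two degrees can be compared without rounding.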

3.5 Computational Issues 

In general it will be very hard to compute the degree of entailment between two 
strings using the preceding definitions. The number of logical interpretations 
that need to be considered increases exponentially with the length of the string. 
One possible way of tackling this would be to use Monte-Carlo techniques, for 
example, sampling dimensions of the vector space and computing degrees of 
entailment for the sample. 

A more principled approach would be to exploit symmetries in the mapping 
p from strings of aspects to the logical language. A further possibility is that a 
deeper analysis of the algebraic properties of context algebras leads to a simpler 
method of computation. Further work is undoubtedly necessary to tackle this 
problem. 
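
The Monte-Carlo idea mentioned above can be sketched for real-valued, non-negative context vectors: sample a subset of the dimensions and compute the degree of entailment on the sample only. This is a crude ratio estimate under stated assumptions (uniform sampling without replacement over the non-zero dimensions of the first vector, and a functional that sums components); the function names and the synthetic data are hypothetical.

```python
import random

def degree_exact(x, y):
    """Degree of entailment using every dimension in which x is non-zero."""
    return sum(min(x[c], y.get(c, 0)) for c in x) / sum(x.values())

def degree_sampled(x, y, sample_size, seed=0):
    """Estimate the degree of entailment from a random sample of dimensions.

    Only dimensions in which x is non-zero can contribute, since the vectors
    are assumed non-negative and the meet is a component-wise minimum.
    """
    rng = random.Random(seed)
    contexts = sorted(x)
    sample = rng.sample(contexts, min(sample_size, len(contexts)))
    numerator = sum(min(x[c], y.get(c, 0)) for c in sample)
    denominator = sum(x[c] for c in sample)
    return numerator / denominator

# Synthetic sparse vectors indexed by integer dimension identifiers.
x = {i: (i % 5) + 1 for i in range(1000)}
y = {i: (i % 3) + 1 for i in range(0, 1000, 2)}

exact = degree_exact(x, y)
estimate = degree_sampled(x, y, sample_size=50)
```

With a sample covering every dimension the estimate coincides with the exact value; smaller samples trade accuracy for speed, and how well this works for the projection-valued construction of this chapter is left open here.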

4 Conclusion 

We have introduced a more general definition of context algebras, and shown
how they can be used to combine vector-based lexical semantics with logic-based
semantics at the sentence level. Whilst computational issues remain to be
resolved, our approach allows the reuse of the abundance of work in logic-based
natural language semantics.